Introduction -- About Computer Vision¶

In the realm of artificial intelligence, computer vision stands as a cornerstone, allowing machines to interpret and understand visual information much like the human eye. This transformative technology has far-reaching implications across diverse fields, from healthcare and automotive industries to security and entertainment.

The Power of Computer Vision¶

Computer vision empowers machines to perceive, interpret, and make decisions based on visual data. Its applications range from facial recognition and object detection to autonomous vehicles and medical image analysis. By harnessing the capabilities of computer vision, we can unlock unprecedented insights from the vast amount of visual information available in the world.

The Significance of Image Recognition¶

Within the expansive domain of computer vision, image recognition plays a pivotal role. The task involves training machines to identify and classify objects or patterns within images, mimicking the human ability to recognize familiar entities. Although it may appear simple, image recognition has long been a driving problem in computer vision, and it remains surprisingly hard for machines. One interesting and practical application is discerning between images of dogs and cats: a seemingly simple task for humans but a complex challenge for computers.

Dog vs Cat Recognition Project¶

This project delves into the intriguing world of image recognition, specifically focusing on distinguishing between images of dogs and cats. Through the lens of deep learning models, in particular convolutional neural networks, we aim to develop a robust and accurate classifier capable of correctly identifying whether an uploaded image features a cat or a dog.

This project uses the Kaggle dataset: Cats and Dogs Breeds Classification Oxford Dataset available at https://www.kaggle.com/datasets/zippyz/cats-and-dogs-breeds-classification-oxford-dataset. Additionally, a version of this notebook is available on my Kaggle profile: https://www.kaggle.com/code/alinagdartmouth/dog-cat-breed.

In [2]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
In [3]:
# Download the images from the data set
from PIL import Image

data_path = '/kaggle/input/cats-and-dogs-breeds-classification-oxford-dataset/images/images'
image_list = os.listdir(data_path)

Handling Incorrect Image Format¶

During the preprocessing phase of the Dog vs Cat classification project, it was identified that three images in the dataset were in an incorrect format—MATLAB files. MATLAB files are not compatible with the image processing pipeline we employed, which expects standard image formats like JPEG or PNG.

To maintain the integrity of the dataset and ensure a smooth training process, we decided to remove these three images. This step helps prevent potential disruptions in the processing flow and ensures that the dataset consists only of images suitable for the machine learning model.

By identifying and addressing issues like incompatible file formats early in the project, we maintain a clean and well-prepared dataset, setting the stage for effective model training and accurate classification results.

In [4]:
# Remove images in the wrong format
import re
numbers = r'[0-9]'
# Filter out .mat files; calling list.remove() while iterating
# over the same list would skip elements
image_list = [item for item in image_list if '.mat' not in item]

Handling Image Variability and Dataset Structuring¶

Since the images do not share a common size or scale, a two-step preprocessing strategy was implemented. First, the original images were converted to matrices and flattened for further analysis of image size. Second, white borders were introduced to bring the images into a consistent format while preserving their aspect ratios. Both representations were added as columns to a data frame. Because the file names encode breed and class, these were extracted and added as columns as well.

This methodical preprocessing lays the groundwork for a robust machine learning model that requires consistent image scale and size. The consistent dataset structure facilitates effective feature extraction, setting the stage for model training.

In [5]:
image_names = []
breed_names = []
class_names = []
images = []
images_scaled = []
flattened = []
cat_images = []
dog_images = []
cat_breeds = []
dog_breeds = []

count = 0
for image_name in image_list:
    # Extract breed from the image name
    breed = re.sub(numbers, '', image_name)
    breed = breed.replace('_','')
    breed = breed.replace('.jpg','')
    
    # Cat breeds have a capitalized first letter in the file name
    dog_or_cat = 'Cat' if breed[0].isupper() else 'Dog'

    # Read image pixels
    image_path = os.path.join(data_path, image_name)
    img = Image.open(image_path)
    image_pixels = np.array(img)
    
    # Resize so the larger dimension is at most 224, preserving aspect ratio
    target_size = (224, 224)
    img.thumbnail(target_size, Image.LANCZOS)

    # Create a new blank image with the target size
    new_img = Image.new("RGB", target_size, (255, 255, 255))
    new_img.paste(img, ((target_size[0] - img.size[0]) // 2, (target_size[1] - img.size[1]) // 2))
    
    image_array = np.array(new_img).reshape(-1, 3)
    
    # Keep a flattened copy of the first 1001 original images for color analysis
    if count <= 1000:
        flattened_pixels = image_pixels.reshape(-1, 3)
        flattened.append(flattened_pixels)
        count += 1


    # Append data to the dataframe
    image_names.append(image_name)
    breed_names.append(breed)
    class_names.append(dog_or_cat)
    images.append(image_pixels)
    images_scaled.append(image_array)
    if dog_or_cat == 'Cat' and breed in ['Abyssinian','Bombay','BritishShorthair','MaineCoon']:
        cat_images.append(image_array)
        cat_breeds.append(breed)
    elif breed in ['wheatenterrier','beagle','greatpyrenees','pomeranian']:
        dog_images.append(image_array)
        dog_breeds.append(breed)
In [6]:
len(image_names)
Out[6]:
7390
In [7]:
# Dictionary of lists (avoid shadowing the built-in dict)
data = {'Image_Name': image_names, 'Breed': breed_names, 'Class': class_names, 'Image_Pixels': images, 'Image_Scaled': images_scaled}
# Create data frame
df_images = pd.DataFrame(data)
In [9]:
# Dictionary of lists (avoid shadowing the built-in dict)
data = {'Breed': cat_breeds, 'Image_Scaled': cat_images}
# Create data frame
df_cats = pd.DataFrame(data)

# Dictionary of lists
data = {'Breed': dog_breeds, 'Image_Scaled': dog_images}
# Create data frame
df_dogs = pd.DataFrame(data)

Exploring the Dataset Through Visualization¶

Cats and Dogs¶

To gain a visual understanding of the dataset, the first nine images were plotted. This exploratory step provides an initial glimpse into the data set, offering insights into the characteristics that distinguish cats from dogs.

In [11]:
import matplotlib.pyplot as plt
# Function to display images and their information
def display_images(df, num_images=9):
    fig, axes = plt.subplots(3, 3, figsize=(10, 10))
    fig.suptitle('Sample Images and Information', fontsize=16)

    for i in range(num_images):
        image_name = df.loc[i, 'Image_Name']
        breed_label = df.loc[i, 'Breed']
        class_label = df.loc[i, 'Class']
        image_array = df.loc[i, 'Image_Scaled']

        # Convert the image_array to a NumPy array
        image_array = np.array(image_array).reshape(224, 224, 3)

        # Display the image and information
        ax = axes[i // 3, i % 3]
        ax.imshow(image_array)
        ax.axis('off')
        ax.set_title(f'Breed: {breed_label}\nClass: {class_label}')

    plt.show()

# Display the first nine images
display_images(df_images)

Image Size¶

Following the initial exploration, the sizes of the original images were systematically plotted. This step aims to provide an overview of the varied dimensions present in the dataset, facilitating a deeper understanding of the diverse visual content. Note that, while there is some variation, on average the images are more or less square.

In [12]:
import seaborn as sns
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.histplot(df_images['Image_Pixels'].apply(lambda x: x.shape[0]), bins=30, kde=False)
plt.title('Distribution of Image Heights')
plt.xlabel('Height')

plt.subplot(1, 2, 2)
sns.histplot(df_images['Image_Pixels'].apply(lambda x: x.shape[1]), bins=30, kde=False)
plt.title('Distribution of Image Widths')
plt.xlabel('Width')

plt.tight_layout()
plt.show()

plt.scatter(df_images['Image_Pixels'].apply(lambda x: x.shape[0]),df_images['Image_Pixels'].apply(lambda x: x.shape[1]))
plt.title('Scatter Plot of Image Heights and Widths')
plt.xlabel('Height')
plt.ylabel('Width')
Out[12]:
Text(0, 0.5, 'Width')

Class and Breed Distribution¶

In further exploration, we examined the distribution of classes (cats vs. dogs) and breeds within the dataset. The results indicated a well-balanced distribution among breeds. However, in terms of classes, there was an imbalance, with approximately twice as many instances of dogs compared to cats.

In [13]:
plt.figure(figsize=(8, 8))
df_images['Class'].value_counts().plot.pie(autopct='%1.1f%%', labels=['Dogs', 'Cats'], startangle=90)
plt.title('Distribution of Classes')
plt.show()
In [15]:
import seaborn as sns
import matplotlib.pyplot as plt

# Plot distribution of Breed Labels
plt.figure(figsize=(12, 6))
sns.countplot(x='Breed', data=df_images, order=df_images['Breed'].value_counts().index)
plt.title('Distribution of Breed Labels')
plt.xlabel('Breed Label')
plt.ylabel('Count')
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better visibility
plt.show()

Color Distribution¶

Subsequently, the distribution of colors within the images was visualized. Notably, a divergence in color intensity emerged during this analysis: on the RGB scale (ranging from 0 to 255), cat images contain noticeably more dark pixels than dog images.

In [16]:
# Dictionary of lists (avoid shadowing the built-in dict)
data = {'Class': class_names[:1001], 'Image_Flattened': flattened}
# Create data frame
df_flat = pd.DataFrame(data)
In [17]:
# Plot color distribution
plt.figure(figsize=(16, 6))
for i, class_label in enumerate(['Dog', 'Cat']):
    class_df = df_flat[df_flat['Class'] == class_label]
    rgb_values = np.concatenate(class_df['Image_Flattened'].apply(lambda x: np.array(x)).values)

    plt.subplot(1, 2, i + 1)
    plt.title(f'Color Distribution for {class_label}')
    plt.xlabel('RGB Value')
    plt.ylabel('Frequency')
    plt.hist(rgb_values, bins=30, color=['red', 'green', 'blue'], alpha=0.7, label=['Red', 'Green', 'Blue'])
    plt.legend()

plt.tight_layout()
plt.show()

Unique Breed Counts¶

To ensure data consistency, we looked at the number of dog and cat breeds. The examination confirmed a higher count of dog breeds (25) compared to cat breeds (12).

In [22]:
import seaborn as sns
import matplotlib.pyplot as plt

# Filter the dataframe for dogs and cats
dogs_df = df_images[df_images['Class'] == 'Dog']
cats_df = df_images[df_images['Class'] == 'Cat']

# Get unique breeds for dogs and cats
unique_dog_breeds = dogs_df['Breed'].nunique()
unique_cat_breeds = cats_df['Breed'].nunique()

print(f"Number of unique dog breeds: {unique_dog_breeds}")
print(f"Number of unique cat breeds: {unique_cat_breeds}")

# Plot the number of unique breeds
plt.figure(figsize=(10, 6))
sns.barplot(x=['Dogs', 'Cats'], y=[unique_dog_breeds, unique_cat_breeds])
plt.title('Number of Unique Breeds for Dogs and Cats')
plt.xlabel('Animal Class')
plt.ylabel('Number of Unique Breeds')
plt.show()
Number of unique dog breeds: 25
Number of unique cat breeds: 12

PCA Analysis¶

In an attempt to uncover underlying patterns within the data, Principal Component Analysis (PCA) was applied. Initial observations revealed the presence of two distinct groups, although no clear demarcation between dogs and cats, or different breeds, was evident.

In [23]:
from sklearn.decomposition import PCA

# Flatten each image
flattened_images = [img.flatten() for img in images_scaled]

# Convert the list of flattened images to a 2D array
image_data = np.array(flattened_images)

# Apply PCA to reduce dimensionality to 2 components for visualization
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(image_data)

# Create a scatter plot of the reduced data
plt.figure(figsize=(10, 8))
sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=df_images['Class'], palette='viridis')
plt.title('PCA for Image Visualization')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.show()

For further analysis, we plotted some images with their positions based on the PCA. Here, we see that while PCA can capture some visual similarities between images (such as Images 4 and 10, or 2 and 7, below), two dimensions are not sufficient to distinguish clearly between breeds or classes. For example, Images 5 and 2 appear very distinct visually but lie fairly close together in the first two principal components.

In [24]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

# Flatten each image
flattened_images = [img.flatten() for img in images_scaled]

# Convert the list of flattened images to a 2D array
image_data = np.array(flattened_images)

# Apply PCA; only the first two of the three components are plotted below
pca = PCA(n_components=3)
reduced_data = pca.fit_transform(image_data)

# Plot 10 images
num_images_to_plot = 10
plt.figure(figsize=(15, 5))
for i in range(num_images_to_plot):
    plt.subplot(2, 5, i + 1)
    plt.imshow(np.reshape(image_data[i+100], (224,224,3)))
    plt.title(f'Image {i + 1}')
    plt.axis('off')

# Create a scatter plot with markers for the selected images
plt.figure(figsize=(10, 8))
plt.scatter(reduced_data[:, 0], reduced_data[:, 1], alpha=0.5, label='PCA Points')

# Mark the positions of the selected images with numbers
for i in range(num_images_to_plot):
    plt.annotate(str(i + 1), (reduced_data[i, 0], reduced_data[i, 1]), color='red', fontsize=10)

plt.title('PCA for Image Visualization with Selected Images')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.show()

To delve deeper into potential patterns within specific breeds, PCA analysis was performed individually for various dog and cat breeds. However, this detailed examination did not yield a clearer distinction or recognizable clustering based on breed characteristics.

In [19]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.decomposition import PCA

# Flatten each image
flattened_cat = [img.flatten() for img in cat_images]
flattened_dog = [img.flatten() for img in dog_images]

# Convert the list of flattened images to a 2D array
cat_image_data = np.array(flattened_cat)
dog_image_data = np.array(flattened_dog)

# Apply PCA for cats
pca_cat = PCA(n_components=2)
reduced_data_cat = pca_cat.fit_transform(cat_image_data)

# Plot the results
plt.figure(figsize=(15, 5))

# Plot PCA for cats
plt.subplot(1, 2, 1)
sns.scatterplot(x=reduced_data_cat[:, 0], y=reduced_data_cat[:, 1], hue=df_cats['Breed'], palette='viridis')
plt.title('PCA for Cat Breeds')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()

# Apply PCA for dogs
pca_dog = PCA(n_components=2)
reduced_data_dog = pca_dog.fit_transform(dog_image_data)

# Plot PCA for dogs
plt.subplot(1, 2, 2)
sns.scatterplot(x=reduced_data_dog[:, 0], y=reduced_data_dog[:, 1], hue=df_dogs['Breed'], palette='viridis')
plt.title('PCA for Dog Breeds')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()

plt.tight_layout()
plt.show()

t-SNE Analysis for Subtle Patterns¶

In addition to PCA, t-SNE analysis was conducted to explore potential subtleties in the data structure. However, similar to the PCA results, the t-SNE plots also revealed a split into two distinct groups, although less pronounced. This consistent observation raises questions about the underlying factors contributing to the dichotomy and is a direction for future work.

In [20]:
from sklearn.manifold import TSNE

# Apply t-SNE for cats
tsne_cat = TSNE(n_components=2, random_state=42)
tsne_result_cat = tsne_cat.fit_transform(cat_image_data)

# Create a DataFrame for t-SNE results
tsne_df_cat = pd.DataFrame(tsne_result_cat, columns=['Dimension 1', 'Dimension 2'])
tsne_df_cat['Breed'] = df_cats['Breed']  # Add breed information to the DataFrame

# Apply t-SNE for dogs
tsne_dog = TSNE(n_components=2, random_state=42)
tsne_result_dog = tsne_dog.fit_transform(dog_image_data)

# Create a DataFrame for t-SNE results (note: use the dog results, not the cat results)
tsne_df_dog = pd.DataFrame(tsne_result_dog, columns=['Dimension 1', 'Dimension 2'])
tsne_df_dog['Breed'] = df_dogs['Breed']  # Add breed information to the DataFrame

# Plot the results
plt.figure(figsize=(15, 5))

# Plot t-SNE for cats
plt.subplot(1, 2, 1)
sns.scatterplot(x='Dimension 1', y='Dimension 2', hue='Breed', data=tsne_df_cat, palette='viridis')
plt.title('t-SNE for Cat Breeds')


# Plot t-SNE for dogs
plt.subplot(1, 2, 2)
sns.scatterplot(x='Dimension 1', y='Dimension 2', hue='Breed', data=tsne_df_dog, palette='viridis')
plt.title('t-SNE for Dog Breeds')


plt.tight_layout()
plt.show()

Insights from Outlier Detection¶

We used an isolation forest to further investigate the data set. All flagged outliers were valid images of cats or dogs, affirming the quality of our data. Remarkably, many of the flagged images are situated between the two distinct groups discovered in the PCA.

Notably, certain breeds are more prone to outlier classification. The Sphynx cat, known for its hairless appearance, had 27 outliers, the most of any breed. Dog breeds at size extremes, such as the large Great Pyrenees and the tiny Chihuahua, also exhibited distinct characteristics, making them more susceptible to outlier classification.

Another thing to note is that some images identified as outliers showed puppies or slightly edited photographs. Since these are technically valid images, we decided to keep them in the data set.

In [25]:
from sklearn.ensemble import IsolationForest

# Fit Isolation Forest on the flattened image matrix used for PCA above
isolation_forest = IsolationForest(contamination=0.05, random_state=42)
outlier_labels = isolation_forest.fit_predict(image_data)

outlier_labels_plot = ['Outlier' if outlier == -1 else 'Not Outlier' for outlier in outlier_labels]

# Visualize outliers in PCA space
plt.figure(figsize=(10, 8))
sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=outlier_labels_plot, palette='viridis')
plt.title('Outlier Detection with Isolation Forest')
plt.xlabel('PCA Dimension 1')
plt.ylabel('PCA Dimension 2')
plt.show()

Outliers are data points that belong to neither of the two groups of images that the PCA distinguishes.

In [26]:
# Indices of the points flagged as outliers (-1) by the Isolation Forest
outlier_indices = np.where(outlier_labels == -1)[0]

# File paths to the original images
image_paths = [os.path.join(data_path, k) for k in df_images['Image_Name']]

# Visualize all images of outliers
plt.figure(figsize=(15, 25))
for i in range(10):
    plt.subplot(10, 5, i*5 + 1)
    plt.axis('off')
    plt.title(f'Line {i+1}')

    for j in range(1, 6):
        idx = 100+ i * 5 + j - 1
        if idx < len(outlier_indices):
            plt.subplot(10, 5, i*5 + j)
            
            # Load and plot the image
            img = Image.open(image_paths[outlier_indices[idx]])
            plt.imshow(img)
            plt.title(f'Outlier {outlier_indices[idx]+1}: ' +df_images['Breed'][outlier_indices[idx]])
            plt.axis('off')

plt.tight_layout()
plt.show()
In [27]:
from collections import Counter

# Breeds of the images flagged as outliers
outlier_breeds = df_images.loc[outlier_indices, 'Breed']

# Count the occurrences of each breed among outliers
breed_outlier_counts = Counter(outlier_breeds)

# Sort breeds by outlier count, descending
breeds, counts = zip(*sorted(breed_outlier_counts.items(), key=lambda x: x[1], reverse=True))

# Plotting
plt.figure(figsize=(12, 6))

# Mark breeds with more than 20 outliers in red
colors = ['red' if count > 20 else 'skyblue' for count in counts]

plt.bar(breeds, counts, color=colors)
plt.xlabel('Breed')
plt.ylabel('Number of Outliers')
plt.title('Number of Outliers per Breed (Ordered)')
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better visibility
plt.tight_layout()
plt.show()

Data Augmentation¶

Addressing Data Imbalance¶

To mitigate the imbalance between cat and dog images, we employed an image generator with diverse transformations, such as rotation, shear, zoom, and horizontal flip. These augmentations significantly expanded our cat image pool, fostering a more balanced representation between the two classes.

The augmentation strategy not only alleviates class distribution disparities but also enhances the model's ability to generalize across varied data scenarios. The resultant dataset, now conducive to unbiased training, lays the groundwork for improved dog vs. cat classification.

This proactive approach to data imbalance exemplifies the importance of preprocessing techniques in optimizing model performance and ensuring equitable representation across diverse classes.

In [28]:
import os

# Specify the output directory
output_dir = '/kaggle/working/augmented_images'

# Create the directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)
In [29]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Create an ImageDataGenerator for data augmentation
datagen = ImageDataGenerator(
    rotation_range=40,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Reshape the cat image data to the original shape (assuming it was flattened)
cat_images_original_shape = np.array(cat_images).reshape((-1, target_size[0], target_size[1], 3))

# Specify the directory to save augmented images (adjust as needed)
save_dir = '/kaggle/working/augmented_images'

# Create augmented images and save them to the specified directory
for i, cat_image in enumerate(cat_images_original_shape):
    cat_image = np.expand_dims(cat_image, axis=0)
    for j, batch in enumerate(datagen.flow(cat_image, batch_size=1, save_to_dir=save_dir, save_prefix=f'cat_aug_{i}', save_format='jpg')):
        if j >= 1:  # Generate one augmented image per original image
            break
In [30]:
import os
from PIL import Image

# Path to the directory containing augmented images
augmented_images_dir = '/kaggle/working/augmented_images'

# List to store augmented image data
augmented_images_data = []

# Loop through files in the directory
for filename in os.listdir(augmented_images_dir):
    # Create the full path to the file
    file_path = os.path.join(augmented_images_dir, filename)
    
    # Load the image
    img = Image.open(file_path)
    
    # Convert the image to a numpy array and flatten
    image_array = np.array(img).reshape(-1, 3)
    
    # Append the flattened image to the list
    augmented_images_data.append(image_array)
In [31]:
import numpy as np

# Convert the original images_scaled list to a numpy array
original_data = np.array(images_scaled)

# Convert the augmented_images_data list to a numpy array
augmented_data = np.array(augmented_images_data)

# Concatenate the original and augmented data
combined_data = np.concatenate((original_data, augmented_data), axis=0)
In [32]:
# Label all augmented images as cats (the LabelEncoder below maps 'Cat' to 0)
class_augmented = [0]*len(augmented_data)

Neural Network building¶

Class Encoding¶

To prepare our data for modeling, we applied a label encoder to systematically encode the categorical classes corresponding to cats and dogs. This encoding step ensures a numeric representation of classes, enabling seamless integration with machine learning models.

The label encoder plays a crucial role in standardizing class labels, facilitating effective model training and prediction. This streamlined data format enhances model interpretability and performance, marking a fundamental step in our journey toward accurate dog vs. cat classification.

In [33]:
from sklearn.preprocessing import LabelEncoder

# Sample data
classes = df_images['Class']

# Initialize LabelEncoder
label_encoder = LabelEncoder()

# Fit and transform the data
encoded_classes = label_encoder.fit_transform(classes)

Train-Test Split¶

Following data augmentation, we strategically partitioned our dataset into training and testing subsets. This division enables rigorous model training on one segment while validating its performance on an independent set. This ensures an unbiased evaluation, a cornerstone for building a reliable and accurate classification model.

In [34]:
# Split data into training and testing data
from sklearn.model_selection import train_test_split

labels = encoded_classes.tolist() + class_augmented

images_combined = [img.reshape(224, 224, 3) for img in combined_data]

# Split the data into training and testing sets
X_train, X_val, y_train, y_val = train_test_split(images_combined, labels, test_size=0.2, random_state=42)

ResNet50¶

For this project, we are using ResNet50. ResNet50, short for Residual Network with 50 layers, represents a milestone in deep learning architecture. Developed by Microsoft Research, it introduces a groundbreaking concept: residual learning.

Traditional deep neural networks faced challenges in training very deep architectures due to the vanishing gradient problem. ResNet50 addresses this with residual blocks. Each block contains a shortcut connection that adds the block's input directly to its output, so the layers only need to learn a residual correction rather than a full transformation. This innovation promotes better convergence and makes networks of 50 layers and beyond trainable.

ResNet50 has proven highly effective in various computer vision tasks, including image classification. Its ability to capture intricate features and nuances in images has made it a popular choice for tasks requiring deep convolutional neural networks.
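The residual idea itself can be illustrated without the full TensorFlow stack. The sketch below, a minimal NumPy example rather than the actual Keras implementation, shows the core computation of a residual block: the output is F(x) + x, so a block whose learned transform is near zero simply passes its input through.

```python
import numpy as np

def residual_block(x, transform):
    """Compute transform(x) + x: the shortcut adds the input back to
    the transformed output, so the block only needs to learn the
    residual F(x) = H(x) - x rather than the full mapping H(x)."""
    return transform(x) + x

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# With a zero transform, the block reduces to the identity, which is
# what makes very deep stacks of such blocks easier to optimize
out = residual_block(x, lambda v: np.zeros_like(v))
```

In ResNet50 itself, the transform corresponds to a small stack of convolution, batch-normalization, and activation layers, with the addition applied before the block's final activation.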

In [ ]:
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.utils import plot_model

# Load ResNet50 model with pre-trained weights
model_resnet50 = ResNet50(weights='imagenet')

# Visualize the model architecture and save it to a file
plot_model(model_resnet50, to_file='resnet50.png', show_shapes=True, show_layer_names=True)
In [35]:
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.optimizers import Adam


# Load ResNet50 with pre-trained ImageNet weights, without the classification head
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the convolutional layers
for layer in base_model.layers:
    layer.trainable = False

# Create a custom classification head on top of ResNet50
model_resnet50 = models.Sequential([
    base_model,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(1, activation='sigmoid'),
])

# Compile the model
model_resnet50.compile(optimizer='adam',
              loss='binary_crossentropy',  
              metrics=['binary_accuracy'])

# Display the model summary
model_resnet50.summary()

# Early stopping callback
early_stopping = callbacks.EarlyStopping(
    monitor='val_binary_accuracy',  # Stop training when the validation accuracy doesn't improve
    patience=10,           # Number of epochs with no improvement after which training will be stopped
    restore_best_weights=True  # Restore model weights from the epoch with the best value of the monitored quantity
)

# Fit the model to the training data
history_resnet50 = model_resnet50.fit(
    np.array(X_train), np.array(y_train),
    validation_data=(np.array(X_val), np.array(y_val)),
    epochs=100, batch_size=256,
    callbacks=[early_stopping],
)
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94765736/94765736 [==============================] - 0s 0us/step
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 resnet50 (Functional)       (None, 7, 7, 2048)        23587712  
                                                                 
 flatten (Flatten)           (None, 100352)            0         
                                                                 
 dense (Dense)               (None, 256)               25690368  
                                                                 
 dropout (Dropout)           (None, 256)               0         
                                                                 
 dense_1 (Dense)             (None, 1)                 257       
                                                                 
=================================================================
Total params: 49278337 (187.98 MB)
Trainable params: 25690625 (98.00 MB)
Non-trainable params: 23587712 (89.98 MB)
_________________________________________________________________
Epoch 1/100
29/29 [==============================] - 28s 548ms/step - loss: 9.1796 - binary_accuracy: 0.8108 - val_loss: 0.2032 - val_binary_accuracy: 0.9666
Epoch 2/100
29/29 [==============================] - 13s 446ms/step - loss: 0.1287 - binary_accuracy: 0.9645 - val_loss: 0.0533 - val_binary_accuracy: 0.9800
Epoch 3/100
29/29 [==============================] - 13s 446ms/step - loss: 0.0522 - binary_accuracy: 0.9861 - val_loss: 0.0476 - val_binary_accuracy: 0.9816
Epoch 4/100
29/29 [==============================] - 13s 446ms/step - loss: 0.0380 - binary_accuracy: 0.9900 - val_loss: 0.0485 - val_binary_accuracy: 0.9828
Epoch 5/100
29/29 [==============================] - 13s 437ms/step - loss: 0.0288 - binary_accuracy: 0.9930 - val_loss: 0.0503 - val_binary_accuracy: 0.9816
Epoch 6/100
29/29 [==============================] - 13s 438ms/step - loss: 0.0251 - binary_accuracy: 0.9936 - val_loss: 0.0507 - val_binary_accuracy: 0.9822
Epoch 7/100
29/29 [==============================] - 13s 438ms/step - loss: 0.0216 - binary_accuracy: 0.9953 - val_loss: 0.0522 - val_binary_accuracy: 0.9822
Epoch 8/100
29/29 [==============================] - 13s 447ms/step - loss: 0.0208 - binary_accuracy: 0.9954 - val_loss: 0.0551 - val_binary_accuracy: 0.9839
Epoch 9/100
29/29 [==============================] - 13s 437ms/step - loss: 0.0210 - binary_accuracy: 0.9961 - val_loss: 0.0544 - val_binary_accuracy: 0.9833
Epoch 10/100
29/29 [==============================] - 13s 438ms/step - loss: 0.0193 - binary_accuracy: 0.9972 - val_loss: 0.0567 - val_binary_accuracy: 0.9828
Epoch 11/100
29/29 [==============================] - 13s 438ms/step - loss: 0.0163 - binary_accuracy: 0.9972 - val_loss: 0.0591 - val_binary_accuracy: 0.9822
Epoch 12/100
29/29 [==============================] - 13s 437ms/step - loss: 0.0164 - binary_accuracy: 0.9969 - val_loss: 0.0573 - val_binary_accuracy: 0.9833
Epoch 13/100
29/29 [==============================] - 13s 438ms/step - loss: 0.0169 - binary_accuracy: 0.9974 - val_loss: 0.0620 - val_binary_accuracy: 0.9828
Epoch 14/100
29/29 [==============================] - 13s 438ms/step - loss: 0.0137 - binary_accuracy: 0.9976 - val_loss: 0.0577 - val_binary_accuracy: 0.9828
Epoch 15/100
29/29 [==============================] - 13s 437ms/step - loss: 0.0145 - binary_accuracy: 0.9986 - val_loss: 0.0614 - val_binary_accuracy: 0.9828
Epoch 16/100
29/29 [==============================] - 13s 437ms/step - loss: 0.0139 - binary_accuracy: 0.9981 - val_loss: 0.0616 - val_binary_accuracy: 0.9828
Epoch 17/100
29/29 [==============================] - 13s 437ms/step - loss: 0.0141 - binary_accuracy: 0.9986 - val_loss: 0.0635 - val_binary_accuracy: 0.9828
Epoch 18/100
29/29 [==============================] - 13s 448ms/step - loss: 0.0136 - binary_accuracy: 0.9983 - val_loss: 0.0655 - val_binary_accuracy: 0.9839
In [36]:
import pandas as pd

history_frame = pd.DataFrame(history_resnet50.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['binary_accuracy', 'val_binary_accuracy']].plot();

Misclassification Analysis¶

In the misclassification analysis, we investigate images that the model failed to classify correctly as either cats or dogs. Upon visual inspection of the misclassified images, it becomes clear that they often exhibit characteristics deviating from the typical visual cues associated with their true labels.

These errors were not arbitrary but rather hinted at certain nuances in the dataset. For instance, misclassified cat images displayed poses and features more reminiscent of dogs, and vice versa.

In particular, we note that Chihuahuas (misclassified 5 times) and Sphynx cats (misclassified 5 times) are the most likely to be misclassified, as both breeds have characteristics that are very distinct from the rest of their class.
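Per-breed counts like these are straightforward to compute because the Oxford dataset encodes the breed in each filename before the final underscore (e.g. `Abyssinian_12.jpg`, `chihuahua_3.jpg`). A minimal sketch, assuming a hypothetical list `misclassified_filenames` of the misclassified files (the toy list below is for illustration only):

```python
from collections import Counter

def breed_from_filename(filename):
    """Extract the breed prefix from an Oxford-style filename like 'Abyssinian_12.jpg'."""
    stem = filename.rsplit('.', 1)[0]   # drop the extension
    return stem.rsplit('_', 1)[0]       # drop the trailing image index

# 'misclassified_filenames' is assumed here for illustration
misclassified_filenames = [
    'chihuahua_3.jpg', 'chihuahua_7.jpg', 'Sphynx_1.jpg', 'Sphynx_9.jpg',
]

# Tally misclassifications per breed
breed_counts = Counter(breed_from_filename(f) for f in misclassified_filenames)
print(breed_counts.most_common())
```

In the real notebook, the filename list would come from indexing the validation file paths with `misclassified_indices`.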

In [44]:
import numpy as np
import matplotlib.pyplot as plt

# Assumes the trained model 'model_resnet50' and the validation arrays
# 'X_val' and 'y_val' defined in the cells above

# Step 1: Make predictions on the validation data and threshold at 0.5
predictions = model_resnet50.predict(np.array(X_val))
predictions = (predictions.ravel() >= 0.5).astype(int)

# Step 2: Identify misclassified samples
misclassified_indices = predictions != y_val
misclassified_images = np.array(X_val)[misclassified_indices]
true_labels = np.array(y_val)[misclassified_indices]
predicted_labels = predictions[misclassified_indices]
# Step 3: Visualize the misclassified images in a 6x5 grid
num_images_to_plot = 29
plt.figure(figsize=(15, 25))
for i in range(num_images_to_plot):
    plt.subplot(6, 5, i + 1)
    plt.imshow(np.reshape(misclassified_images[i], (224,224,3)))
    plt.title(f"True Label: {true_labels[i]}; Predicted: {predicted_labels[i]}")
    plt.axis('off')

plt.suptitle('Misclassified Images on Validation Data', y=1.05, fontsize=20)
plt.tight_layout()
plt.show()
57/57 [==============================] - 3s 51ms/step

The confusion matrix below shows that the model is more likely to misclassify images of dogs as cats than vice versa.

In [45]:
import seaborn as sns
# Step 4: Tally misclassification patterns (true label -> predicted label counts)
misclassification_analysis = {}
for true_label, predicted_label in zip(true_labels, predicted_labels):
    counts = misclassification_analysis.setdefault(true_label, {})
    counts[predicted_label] = counts.get(predicted_label, 0) + 1
            
# Print misclassification analysis
print("Misclassification Analysis:")
for true_label, predictions in misclassification_analysis.items():
    print(f"True Label {true_label}: {predictions}")

# Visualize misclassification patterns
fig, ax = plt.subplots(figsize=(10, 6))
heatmap_data = np.zeros((2,2))
for true_label, predictions in misclassification_analysis.items():
    for predicted_label, count in predictions.items():
        heatmap_data[true_label, predicted_label] = count

sns.heatmap(heatmap_data, annot=True, fmt='g', cmap='viridis', cbar=True, ax=ax)
ax.set_xlabel('Predicted Label')
ax.set_ylabel('True Label')
ax.set_title('Misclassification Patterns')
plt.show()
Misclassification Analysis:
True Label 1: {0: 18}
True Label 0: {1: 11}
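As a sanity check on the heatmap above, the full 2x2 confusion matrix (correct classifications included) can be computed in one step with `np.bincount` over the complete label and prediction arrays, not just the misclassified subset. The toy arrays below stand in for `y_val` and the thresholded `predictions`:

```python
import numpy as np

# Toy stand-ins for y_val and the thresholded predictions
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0])

# Encode each (true, predicted) pair as 2*true + pred, then count occurrences;
# reshaping gives rows = true label, columns = predicted label
conf = np.bincount(2 * y_true + y_pred, minlength=4).reshape(2, 2)
print(conf)
```

The off-diagonal entries are the misclassification counts that the manual tally reports, while the diagonal shows the correct classifications the tally omits.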

Future Work¶

While the current implementation provides a robust foundation for cat vs. dog image classification, there are several avenues for future exploration and enhancement:

  1. Fine-Tuning and Hyperparameter Optimization: Experiment with fine-tuning the hyperparameters of the existing model or explore different pre-trained models to improve overall classification accuracy.
  2. Ensemble Learning: Investigate ensemble learning techniques by combining predictions from multiple models, potentially leveraging diverse architectures or training strategies.
  3. Transfer Learning for Breed Classification: Extend the project to include a specialized model for breed classification.
  4. Data Augmentation: Implement additional data augmentation techniques to enhance the model's ability to recognize diverse characteristics associated with different classes.
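
For item 4, a Keras pipeline would typically use preprocessing layers such as `RandomFlip` and `RandomRotation`; the core idea of label-preserving augmentation can be sketched framework-free with NumPy (the flip probability and brightness range below are illustrative choices, not values from this notebook):

```python
import numpy as np

def augment(image, rng):
    """Apply simple label-preserving augmentations to an HxWx3 float image in [0, 1]."""
    if rng.random() < 0.5:              # random horizontal flip
        image = image[:, ::-1, :]
    factor = rng.uniform(0.8, 1.2)      # random brightness jitter
    return np.clip(image * factor, 0.0, 1.0)

# Example on a dummy 224x224 RGB image (the input size used by the model above)
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
augmented = augment(image, rng)
print(augmented.shape)
```

Applying such transforms on the fly during training effectively enlarges the dataset and can reduce overfitting, which the widening gap between training and validation loss in the log above suggests is present.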